Skip to content

Fix SimpleStatement.is_lwt(): detect LWT from CQL query string#784

Draft
mykaul wants to merge 1 commit intoscylladb:masterfrom
mykaul:feature/simple-statement-is-lwt
Draft

Fix SimpleStatement.is_lwt(): detect LWT from CQL query string#784
mykaul wants to merge 1 commit intoscylladb:masterfrom
mykaul:feature/simple-statement-is-lwt

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented Apr 1, 2026

Summary

SimpleStatement.is_lwt() always returns False (inherited from Statement base class, line 348). This means all LWT-aware optimizations are silently disabled for:

  1. Raw CQL users: session.execute(SimpleStatement("INSERT ... IF NOT EXISTS")) — LWT routing and retry policies never activate
  2. All cqlengine queries: cqlengine wraps every query in SimpleStatement at connection.py:347-349, losing any LWT information

This is particularly impactful because:

Fix

Added regex-based LWT detection to SimpleStatement.is_lwt() that matches CQL patterns:

Pattern Example
IF NOT EXISTS INSERT INTO t (a) VALUES (1) IF NOT EXISTS
IF EXISTS UPDATE t SET a=1 WHERE k=1 IF EXISTS
IF <column> UPDATE t SET a=1 WHERE k=1 IF a = 2

The regex is case-insensitive and handles multiline queries, extra whitespace, and tabs. The result is cached after the first call to avoid repeated regex matching.

This is a best-effort heuristicPreparedStatement continues to use the authoritative is_lwt flag from the server. The main known false positive is DDL queries like CREATE TABLE IF NOT EXISTS, which is harmless since DDL doesn't use token-aware routing.

Changes

  • cassandra/query.py: Added _LWT_PATTERN regex and SimpleStatement.is_lwt() override
  • tests/unit/test_query.py: Added SimpleStatementIsLwtTest class with 20 tests; updated existing BatchStatement test to use the new detection instead of a workaround subclass

Tests

20 new tests covering:

  • INSERT/UPDATE/DELETE IF [NOT] EXISTS (various cases)
  • Conditional updates/deletes (IF col = val, IF col != val, IF col > val)
  • Multiple conditions (IF a = 1 AND b = 2)
  • Case insensitivity (lowercase, mixed case)
  • Non-LWT queries (SELECT, INSERT/UPDATE/DELETE without IF)
  • DDL with IF NOT EXISTS (documented false positive)
  • Result caching
  • Multiline queries, extra whitespace, tab separation

All 26 tests in tests/unit/test_query.py pass. All 82 tests in tests/unit/test_policies.py pass.

Related

@Lorak-mmk
Copy link
Copy Markdown

All cqlengine queries: cqlengine wraps every query in SimpleStatement at connection.py:347-349, losing any LWT information

This sounds like a core of the problem - it should use prepared statements.

@Lorak-mmk
Copy link
Copy Markdown

SimpleStatement.is_lwt() always returns False (inherited from Statement base class, line 348). This means all LWT-aware optimizations are silently disabled for:

Imo it is working as intended. If someone needs performance (including routing optimizations like the ones for LWT) they should use prepared statements. Drivers don't parse query strings.

The main known false positive is DDL queries like CREATE TABLE IF NOT EXISTS, which is harmless since DDL doesn't use token-aware routing.

DDL is the main use case for non-prepared statements. If you use a statement multiple times, you should prepare it.

Also, a big false positive is a request with a string that contains IF NOT EXISTS, no?

@mykaul mykaul marked this pull request as draft April 1, 2026 10:41
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 1, 2026

All cqlengine queries: cqlengine wraps every query in SimpleStatement at connection.py:347-349, losing any LWT information

This sounds like a core of the problem - it should use prepared statements.

It is, but it's a separate issue, or that's what I thought.

@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 1, 2026

SimpleStatement.is_lwt() always returns False (inherited from Statement base class, line 348). This means all LWT-aware optimizations are silently disabled for:

Imo it is working as intended. If someone needs performance (including routing optimizations like the ones for LWT) they should use prepared statements. Drivers don't parse query strings.

The main known false positive is DDL queries like CREATE TABLE IF NOT EXISTS, which is harmless since DDL doesn't use token-aware routing.

DDL is the main use case for non-prepared statements. If you use a statement multiple times, you should prepare it.

This is something we need to agree with - we don't make efforts to improve non-prepared statements performance. I'm OK with that.

Also, a big false positive is a request with a string that contains IF NOT EXISTS, no?

Yes, a known one. I can handle it, if needed.

@mykaul mykaul force-pushed the feature/simple-statement-is-lwt branch from 00b14ef to e8b3ee6 Compare April 1, 2026 11:21
SimpleStatement.is_lwt() always returned False, which meant LWT-aware
routing (TokenAwarePolicy Paxos leader routing) and retry policies
never activated for:
- Raw CQL queries via session.execute(SimpleStatement(...))
- All cqlengine queries (which wrap everything in SimpleStatement)

This adds regex-based LWT detection to SimpleStatement that matches:
- INSERT ... IF NOT EXISTS
- UPDATE/DELETE ... IF EXISTS
- UPDATE/DELETE ... IF <column> <op> <value> (conditional)
- UPDATE/DELETE ... IF "column" <op> <value> (quoted identifiers)
- BEGIN BATCH ... IF NOT EXISTS ... APPLY BATCH

DDL statements (CREATE/ALTER/DROP ... IF [NOT] EXISTS) are correctly
excluded by checking that the first word is a DML verb.

The result is cached after the first call. This is a best-effort
heuristic; PreparedStatement continues to get the authoritative
is_lwt flag from the server during PREPARE.

Also updates the existing BatchStatement LWT test to use the new
SimpleStatement.is_lwt() directly instead of a workaround subclass.
@mykaul mykaul force-pushed the feature/simple-statement-is-lwt branch from e8b3ee6 to 09ce230 Compare April 1, 2026 12:25
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 1, 2026

v2 — Rebased on current master (8e6c4d4e7). Changes since v1:

  1. Fixed DDL false positives: Added a DML verb check — is_lwt() now requires the first word of the query to be a DML verb (INSERT, UPDATE, DELETE, or BEGIN) before checking for IF. This eliminates false positives from DDL statements like CREATE TABLE IF NOT EXISTS, DROP TABLE IF EXISTS, CREATE INDEX IF NOT EXISTS, etc.

  2. Fixed quoted identifier false negative: The regex now correctly detects LWT in queries using quoted identifiers, e.g. UPDATE t SET x=1 WHERE id=1 IF "col" = 1.

  3. 30 unit tests covering: all LWT forms (INSERT IF NOT EXISTS, UPDATE IF EXISTS, UPDATE IF , DELETE IF EXISTS, DELETE IF , BEGIN BATCH with LWT), non-LWT DML, DDL with IF (6 DDL cases), quoted identifiers, whitespace/tab/multiline variations, caching behavior.

@Lorak-mmk
Copy link
Copy Markdown

This is something we need to agree with - we don't make efforts to improve non-prepared statements performance. I'm OK with that.

In my opinion we shouldn't. We will only introduce hacks making the code more complicated, and there will always be cases where the behavior differs from prepared statement. User wants performance, correct routing -> user needs prepared statement.
For simple statement, you are never going to be able to be token-aware anyway, so LWT won't be correctly routed. I don't see the benefit of is_lwt heuristic.

@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 8, 2026

This is something we need to agree with - we don't make efforts to improve non-prepared statements performance. I'm OK with that.

In my opinion we shouldn't. We will only introduce hacks making the code more complicated, and there will always be cases where the behavior differs from prepared statement. User wants performance, correct routing -> user needs prepared statement.
For simple statement, you are never going to be able to be token-aware anyway, so LWT won't be correctly routed. I don't see the benefit of is_lwt heuristic.

Perhaps we should focus on improving cqlengine queries then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants